Towards Accurate Probabilistic Lexicons for Lexicalized Grammars
نویسنده
چکیده
This paper proposes a method of constructing an accurate probabilistic subcategorization (SCF) lexicon for a lexicalized grammar extracted from a treebank. We employ a latent variable model to smooth co-occurrence probabilities between verbs and SCF types in the extracted lexicalized grammar. We applied our method to a verb SCF lexicon of an HPSG grammar acquired from the Penn Treebank. Experimental results show that probabilistic SCF lexicons obtained by our model achieved a lower test-set perplexity against ones obtained by a naive smoothing model using twice as large training data.
منابع مشابه
An Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars
We present an empirical study of the applicability of Probabilistic Lexicalized Tree Insertion Grammars (PLTIG), a lexicalized counterpart to Probabilistic Context-Free Grammars (PCFG), to problems in stochastic naturallanguage processing. Comparing the performance of PLTIGs with non-hierarchicalN -gram models and PCFGs, we show that PLTIG combines the best aspects of both, with language modeli...
متن کاملA model of syntactic disambiguation based on lexicalized grammars
This paper presents a new approach to syntactic disambiguation based on lexicalized grammars. While existing disambiguation models decompose the probability of parsing results into that of primitive dependencies of two words, our model selects the most probable parsing result from a set of candidates allowed by a lexicalized grammar. Since parsing results given by the lexicalized grammar cannot...
متن کاملBilexical Grammars and a Cubic-time Probabilistic Parser
Computational linguistics has a long tradition of lexicalized grammars, in which each grammatical rule is specialized for some individual word. The earliest lexicalized rules were word-specific subcategorization frames. It is now common to find fully lexicalized versions of many grammatical formalisms, such as context-free and tree-adjoining grammars [Schabes et al. 1988]. Other formalisms, suc...
متن کاملLexical Generalization in CCG Grammar Induction for Semantic Parsing
We consider the problem of learning factored probabilistic CCG grammars for semantic parsing from data containing sentences paired with logical-form meaning representations. Traditional CCG lexicons list lexical items that pair words and phrases with syntactic and semantic content. Such lexicons can be inefficient when words appear repeatedly with closely related lexical content. In this paper,...
متن کاملLexicalized TAGs, Parsing and Lexicons
In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality (as compared to a context-free grammar) over which constraints can be stated. These constraints either hold within the elementary structure itself or specify what other structures can be composed with a given elementary structure. The 'grammar' consi...
متن کامل